DATA Step or PROC? It depends...


As you know, almost every SAS programming problem has many very different solutions. I’m going to solve a very simple problem using two different approaches.

The problem: Compute the sum of integers from 1 to 1,000,000.

I bet most of you thought of a solution almost immediately. Let me guess that you thought of one of the solutions shown below:

Solution 1

options fullstimer; ❶
 
data _null_; ❷
   do i = 1 to 1000000;
      Sum + i; ❸
   end;
   file print; ❹
   put Sum=;
run;

❶ The option fullstimer gives you more complete timing information.

❷ To be more efficient, use DATA _NULL_; the step runs without creating an output data set.

❸ The SUM statement does several things: 1) The variable Sum is retained (not set back to missing) for each iteration of the DATA Step. 2) Sum is initialized to zero. 3) If the expression being added (here, the variable i) had a missing value, that value would be ignored.

❹ Use FILE PRINT to send the output to the output window instead of the default location, the LOG.
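To make the three behaviors of the SUM statement (❸) concrete, here is a minimal sketch that is not part of the original program. It adds the values 1, missing, and 3 twice: once with a SUM statement and once with an explicit RETAIN plus the SUM function, which is roughly what the SUM statement does for you. The array vals and the variables Implicit and Explicit are names made up for this illustration.

```sas
data _null_;
   /* Hypothetical test values; the middle one is missing */
   array vals[3] _temporary_ (1 . 3);

   /* Spelled-out version: keep the value across iterations, start at 0 */
   retain Explicit 0;

   do j = 1 to 3;
      Implicit + vals[j];                  /* SUM statement */
      Explicit = sum(Explicit, vals[j]);   /* SUM function skips missing values */
   end;

   put Implicit= Explicit=;   /* Log: Implicit=4 Explicit=4 */
run;
```

Had the loop used a plain assignment such as Total = Total + vals[j], the missing value would have propagated and Total would end up missing.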

Solution 2

data Integers; ❶
   do i = 1 to 10000000;
      output; ❷
   end;
run;
 
title "Sum of Integers";
proc means data=Integers n sum; ❸
   var i;
run;

❶ Create a data set called Integers.

❷ Output an observation for each iteration of the DO loop. Note that the OUTPUT statement is inside the DO loop.

❸ Use PROC MEANS to compute the sum.
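PROC MEANS is not the only procedure that can total a column. As a hedged aside that is not part of the timing comparison in this post, the sketch below uses PROC SQL on the same Integers data set created in Solution 2. The column aliases N and Total are made up for this illustration, and no timing claims are made for it.

```sas
title "Sum of Integers (PROC SQL)";
proc sql;
   /* Count the rows and add up i in a single query */
   select count(i) as N,
          sum(i)   as Total format=comma20.
   from Integers;
quit;
```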

Although both programs work, there is a difference in CPU time. Does that mean you should always seek a DATA Step solution? Not really. It depends on several factors, such as how often the program is to be run and which method you feel most comfortable with.

Here is a partial listing of the SAS Log showing timing information:

NOTE: 1 lines were written to file PRINT.
NOTE: DATA statement used (Total process time):
      real time           1.00 seconds
      user cpu time       0.21 seconds
      system cpu time     0.32 seconds
      memory              7875.03k
      OS Memory           16876.00k
      Timestamp           02/16/2023 08:23:18 AM
      Step Count           1  Switch Count  0

NOTE: The data set WORK.INTEGERS has 10000000 observations and 1
      variables.
NOTE: DATA statement used (Total process time):
      real time           0.21 seconds
      user cpu time       0.20 seconds
      system cpu time     0.00 seconds
      memory              410.03k
      OS Memory           17392.00k
      Timestamp           02/16/2023 08:23:18 AM
      Step Count           2  Switch Count  0
NOTE: There were 10000000 observations read from the data set
      WORK.INTEGERS.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.25 seconds
      user cpu time       1.07 seconds
      system cpu time     0.03 seconds
      memory              8471.84k
      OS Memory           25116.00k
      Timestamp           02/16/2023 08:23:19 AM
      Step Count           3  Switch Count  0

Do you care about CPU time? Unless this is a production program (or you are a compulsive programmer who wants the “best” program), I think you should program in the way that is most comfortable for you. By the way, if you remove the FILE PRINT statement from Solution 1, the system CPU time is 0.0. I guess there is some overhead in sending the results to your output device.
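For reference, Solution 1 without the FILE PRINT statement would look like the sketch below. It is the same program minus one line, so the PUT statement writes to the SAS Log. Timings will vary from machine to machine, so treat any numbers you see as specific to your own session.

```sas
options fullstimer;

data _null_;
   do i = 1 to 1000000;
      Sum + i;
   end;
   put Sum=;   /* no FILE PRINT, so the result is written to the SAS Log */
run;
```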

I’m interested in what your first instinct was when you read the problem. Was it one of these two, or something else? Please post your comments below.


About Author

Ron Cody

Private Consultant

Dr. Ron Cody was a Professor of Biostatistics at the Rutgers Robert Wood Johnson Medical School in New Jersey for 26 years. During his tenure at the medical school, he taught biostatistics to medical students as well as students in the Rutgers School of Public Health. While on the faculty, he authored or co-authored over a hundred papers in scientific journals. His first book, Applied Statistics and the SAS Programming Language, was first published by Prentice Hall in 1985 and is now in its fifth edition. Since then, he has published over a dozen books on SAS programming and statistical analysis using SAS. His latest book, A Gentle Introduction to Statistics Using SAS Studio was published this year. Ron has presented numerous papers at SAS Global forums, regional conferences, as well as local user groups. He is presently a contract instructor for SAS Institute and continues to write books on SAS and statistical topics.

3 Comments

  1. Wow, I had to Google that one. Never came across that formula before. However, I hope you see where I was going in this blog. Best, Ron

  2. Hello Ron, I would suggest:

    data _null_;
    sum = 500000*1000001;
    put sum=;
    run;

    But maybe this one is out of competition 🙂

    Eric
